
FASTER DISCRETE COSINE TRANSFORMS USING SCALED TERMS 

CROSS-REFERENCE TO RELATED APPLICATION 
This application is related to the following co-pending and commonly-assigned patent applications, which are hereby incorporated herein by reference in their respective entirety:

"FASTER TRANSFORMS USING SCALED TERMS" to Trelewicz et al., having attorney docket no. BLD9-2000-0059US1.

"FASTER TRANSFORMS USING EARLY ABORTS AND PRECISION REFINEMENTS" to Mitchell et al., having attorney docket no. BLD9-2000-0064US1.



BACKGROUND OF THE INVENTION 

1. Field of the Invention

This invention relates in general to data processing, and more particularly to 
faster discrete cosine transforms using scaled terms. 

2. Description of Related Art
Transforms, which take data from one domain (e.g., sampled data) to another (e.g., frequency space), are used in many signal and/or image processing applications. Such transforms are used for a variety of applications, including, but not limited to, data analysis, feature identification and/or extraction, signal correlation, data compression, or data embedding. Many of these transforms require efficient implementation for real-time and/or fast execution.

Data compression is desirable in many data handling processes, where too much data is present for practical applications using the data. Commonly, compression is used in communication links to reduce transmission time or required bandwidth. Similarly, compression is preferred in data storage systems, including digital printers and copiers, where "pages" of a document to be printed may be stored temporarily in memory. Here the amount of media space can be substantially reduced with compression. Generally speaking, scanned images, i.e., electronic representations of hard copy documents, are often large, and thus make desirable candidates for compression.

It is well known in the art to use a discrete cosine transform for data compression. In particular examples referred to herein, the terms images and image processing will be used. However, those skilled in the art will recognize that the present invention is not meant to be limited to processing images but is applicable to processing different data, such as audio data, scientific data, image data, etc.

In combination with other techniques, such as color subsampling, quantization, Huffman coding and run-length coding, the discrete cosine transform can compress a digital color image by a factor of approximately thirty to one with virtually no noticeable image degradation. Because of its usefulness in data compression, the discrete cosine transform is an integral part of several data compression standards used by international standards committees such as the International Organization for Standardization (ISO).
DCT (Discrete Cosine Transform)-based compression, as disseminated by the JPEG committee, is a lossy compression system which reduces data redundancies based on pixel-to-pixel correlations. Generally, an image does not change very much on a pixel-to-pixel basis and therefore has what is known as "natural spatial correlation". In natural scenes, correlation is generalized, but not exact. Noise makes each pixel somewhat different from its neighbors. Moreover, signal and data processing frequently needs to convert the input data into transform coefficients for the purposes of analysis. Often only a quantized version of the coefficients is needed (e.g., JPEG/MPEG data compression or audio/voice compression). Many such applications need to be done fast in real time, such as the generation of JPEG data for high-speed printers.

An example of a JPEG DCT compression and decompression system may be found in the Encyclopedia of Graphics File Formats, by J. D. Murray and W. vanRyper, pp. 159-171 (1994, O'Reilly & Associates, Inc.). Further description of the draft JPEG standard may be found, for example, in "JPEG Still Image Data Compression Standard," by W. Pennebaker and J. Mitchell, 1993 (Van Nostrand Reinhold, New York) or "Discrete Cosine Transform: Algorithms, Advantages and Applications," by K. Rao and P. Yip, 1990 (Academic Press, San Diego).

The two-dimensional discrete cosine transform is a pair of mathematical equations that transforms one N1×N2 array of numbers to or from another N1×N2 array of numbers. The first array typically represents a square N×N array of spatially determined pixel values which form the digital image. The second array is an array of discrete cosine transform coefficients which represent the image in the frequency domain. This method of representing the image by the coefficients of its frequency components is closely related to the discrete Fourier transform. The discrete Fourier transform is the discrete version of the classic mathematical Fourier transform, wherein any periodic waveform may be expressed as a sum of sine and cosine waves of different frequencies and amplitudes. The discrete cosine transform, like the Fourier transform, is thus a transform which transforms a signal from the time (or spatial) domain into the frequency domain and vice versa. With an input image, A, the coefficients for the output "image," B, are:



$$B(k_1,k_2) = \sum_{i=0}^{N_1-1}\sum_{j=0}^{N_2-1} 4\,A(i,j)\,\cos\left[\frac{\pi k_1}{2N_1}(2i+1)\right]\cos\left[\frac{\pi k_2}{2N_2}(2j+1)\right]$$
For an image, the input is N2 pixels wide by N1 pixels high; A(i, j) is the intensity of the pixel in row i and column j; B(k1, k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix. All DCT multiplications are real. This lowers the number of required multiplications, as compared to the discrete Fourier transform. For most images, much of the signal energy lies at low frequencies; these appear in the upper left corner of the DCT. The lower right values represent higher frequencies, and are often small enough to be neglected with little visible distortion.

There are two basic discrete cosine transform equations. The first basic equation is the forward discrete cosine transform, which transforms the pixel values into the discrete cosine transform coefficients. The second basic equation is the inverse discrete cosine transform, which transforms the discrete cosine transform coefficients back into pixel values. Most applications of the discrete cosine transform for images use eight-by-eight arrays, wherein N therefore has a value of eight. Assuming then that N has the value of eight when performing the transforms, where f(x, y) are the values of the pixel array and F(u, v) are the values of the discrete cosine transform coefficients, the equations of the discrete cosine transforms are as follows.

The formula for the 2-D discrete cosine transform is given by:

$$F(u,v) = \frac{1}{4}\,C_u C_v \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,\cos\left[\frac{(2x+1)u\pi}{16}\right]\cos\left[\frac{(2y+1)v\pi}{16}\right]$$

where x, y = spatial coordinates in the spatial domain (0, 1, 2, ..., 7); u, v = coordinates in the transform domain (0, 1, 2, ..., 7); C_u = 1/√2 for u = 0, otherwise 1; and C_v = 1/√2 for v = 0, otherwise 1. The separable nature of the 2-D DCT is exploited by performing a 1-D DCT on the eight columns, and then a 1-D DCT on the eight rows of the result. Several fast algorithms are available to calculate the 8-point 1-D DCT.
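The separability described above can be checked numerically. The following Python sketch is an illustration, not the method of this specification: it evaluates the 8x8 formula directly and again as eight 1-D DCTs along x followed by eight along y. The function names are hypothetical, and a factor of 1/2·C_u per dimension is assumed so that the two passes reproduce the 1/4·C_u·C_v factor.

```python
import math

N = 8

def c(k):
    # C_k = 1/sqrt(2) for k = 0, otherwise 1
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_1d(v):
    # 8-point 1-D DCT with a 1/2 * C_u factor, so two passes give 1/4 * Cu * Cv
    return [0.5 * c(u) * sum(v[x] * math.cos((2 * x + 1) * u * math.pi / 16)
                             for x in range(N))
            for u in range(N)]

def dct_2d_direct(f):
    # Direct evaluation of the 2-D formula: 64 terms per coefficient
    return [[0.25 * c(u) * c(v) * sum(
                f[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]

def dct_2d_separable(f):
    # Pass 1: 1-D DCT along x for each fixed y, giving cols[y][u]
    cols = [dct_1d([f[x][y] for x in range(N)]) for y in range(N)]
    # Pass 2: 1-D DCT along y for each fixed u, giving result[u][v]
    return [dct_1d([cols[y][u] for y in range(N)]) for u in range(N)]

block = [[(3 * x + 5 * y) % 17 for y in range(N)] for x in range(N)]
direct = dct_2d_direct(block)
separable = dct_2d_separable(block)
print(max(abs(direct[u][v] - separable[u][v]) for u in range(N) for v in range(N)))
```

The two passes need 16 × 64 = 1024 multiply-adds instead of 4096 terms for the direct form, and the fast 1-D algorithms discussed below reduce this further.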

As described above, a DCT compressor comprises mainly two parts. The first part transforms highly correlated image data into weakly correlated coefficients using a DCT, and the second part performs quantization on the coefficients to reduce the bit rate for transmission or storage. However, the computational burden of performing a DCT is demanding. For example, processing a one-dimensional DCT of length 8 pixels requires 13 multiplications and 29 additions in currently known fast algorithms. The image is divided into square blocks of size 8 by 8 pixels, 16 by 16 pixels or 32 by 32 pixels. Each block is often processed by the one-dimensional DCT in row-by-row fashion followed by column-by-column. Different block sizes may be selected for compression due to different types of input and different quality requirements on the decompressed data.

In the article "A Fast DCT-SQ Scheme for Images," Trans. IEICE, Vol. E71, No. 11, pp. 1095-1097, Nov. 1988, Y. Arai, T. Agui, and M. Nakajima proposed that many of the DCT multiplications can be formulated as scaling multipliers to the DCT coefficients. The DCT after the multipliers are factored out is called the scaled DCT. The scaled DCT is still orthogonal but no longer normalized, and the scaling factors may be restored in a subsequent quantization process. Arai et al. demonstrated in their article that only 5 multiplications and 29 additions are required in processing an 8-point scaled DCT.



However, there is a need to further increase the speed of the encoder, because more than half of the time in the JPEG encoder is spent in the Forward Discrete Cosine Transform (FDCT) code calculating the two-dimensional (2-D) 8x8 block of 8-bit or 12-bit samples. Currently, the 2-D FDCT is calculated by first calculating eight 1-D horizontal DCTs and then calculating another eight 1-D vertical DCTs using the currently fastest known 1-D DCT, which was first described by Arai, Agui, and Nakajima, as mentioned above. The current process takes 29 additions and 5 multiplications to calculate a scaled version of the 1-D FDCT. The scaling constants are the same for each vertical column and can finally be included in the quantization step. This prior solution saved eight multiplications per 1-D FDCT.
As stated above, it can be seen that there is a need to provide a faster DCT.

It can also be seen that there is a need to provide a method and apparatus for performing discrete cosine transforms with fewer addition and multiplication steps to increase throughput of an encoder.
SUMMARY OF THE INVENTION



To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a faster discrete cosine transform that uses scaled terms.

The present invention solves the above-described problems by decreasing the number of multiplications by using scaling terms on the transform constants. Further, after scaling, the terms of the constants are replaced by a series of fewer linear shifts and additions, since the constants may be approximated by sums of powers-of-2 after only a few summations. Those skilled in the art will recognize that throughout this specification, the term "matrix" is used both in its traditional mathematical sense and to cover all hardware and software systems which, when analyzed, could be equivalently represented as a mathematical matrix.



A method in accordance with the principles of the present invention includes arranging discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection, and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2.

Other embodiments of a method in accordance with the principles of the invention may include alternative or optional additional aspects. One such aspect of the present invention is that the method further includes separating an image into at least one block and transforming the block into transformed data by performing matrix multiplication on the discrete cosine transform equations based upon binary arithmetic using the estimated scaled discrete cosine transform constants and performing linear shifts and additions.

Another aspect of the present invention is that the scaling the discrete cosine 
transform equations in the at least one collection by dividing each of the discrete 
cosine transform constants in the at least one collection by one of the discrete cosine 
transform constants from the at least one collection saves multiplications. 
Another aspect of the present invention is that the discrete cosine transform constant chosen for scaling the discrete cosine transform equations in the at least one collection is selected according to a predetermined cost function.

Another aspect of the present invention is that the cost function minimizes a 
number of add operations. 
Another aspect of the present invention is that the cost function minimizes a worst-case number of add operations.

Another aspect of the present invention is that the cost function minimizes an 
error per constant resulting from the approximations. 

Another aspect of the present invention is that the transforming the block into transformed data further comprises generating at least one set of one-dimensional discrete cosine transform equations.
Another aspect of the present invention is that the discrete cosine transform constants are obtained by splitting the discrete cosine transform constants into even and odd terms by obtaining sums and differences of input samples.

Another aspect of the present invention is that the block is an N1×N2 block.

Another aspect of the present invention is that N1 = N2 = 8.

In another embodiment of the present invention, a data compression system is provided. The data compression system includes a discrete cosine transformer for applying a discrete cosine transform to decorrelate data into transform coefficients using discrete cosine transform equations, the discrete cosine transform equations having been formed by arranging the discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection, and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2; and a quantizer for quantizing the transformed data into quantized data to reduce the number of bits needed to represent the transform coefficients.

In another embodiment of the present invention, a printer is provided. The printer includes a memory for storing data, a processor for processing the data to provide a compressed print stream output, and a printhead driving circuit for controlling a printhead to generate a printout of the data, wherein the processor applies a discrete cosine transform to decorrelate data into transform coefficients using discrete cosine transform equations, the discrete cosine transform equations having been formed by arranging the discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection, and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2.

In another embodiment of the present invention, an article of manufacture is provided. The article of manufacture includes a program storage medium readable by a computer, the medium tangibly embodying one or more programs of instructions executable by the computer to perform a method for performing faster discrete cosine transforms, the method including arranging discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection, and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2.

In another embodiment of the present invention, a data analysis system is provided. The data analysis system includes a memory for storing the discrete cosine transform equations, the equations having been formed by arranging the discrete cosine transform equations into at least one collection having at least two discrete cosine transform constants, scaling the discrete cosine transform equations in the at least one collection by dividing each of the discrete cosine transform constants in the collection by one of the discrete cosine transform constants from the at least one collection, and representing each of the scaled discrete cosine transform constants with estimated scaled discrete cosine transform constants approximated by sums of powers-of-2; and a transformer for applying the transform equations to perform a discrete cosine transform to decorrelate data into discrete cosine transform coefficients.

These and various other advantages and features of novelty which characterize 
the invention are pointed out with particularity in the claims annexed hereto and form a 
part hereof. However, for a better understanding of the invention, its advantages, and 
the objects obtained by its use, reference should be made to the drawings which form 
a further part hereof, and to accompanying descriptive matter, in which there are 
illustrated and described specific examples of an apparatus in accordance with the 
invention. 
BRIEF DESCRIPTION OF THE DRAWINGS

Referring now to the drawings in which like reference numbers represent 
corresponding parts throughout: 

Fig. 1 illustrates a block diagram of a JPEG encoder; 

Fig. 2 illustrates a table that shows the terms C2 and C6; 

Fig. 3 is a table that shows the actual values if either C2 or C6 is used to scale the coefficient matrix and the estimates are a ratio of integers;

Fig. 4 shows all possible combinations of dividing the coefficient matrix by C1, C3, C5, and C7;

Fig. 5 shows the actual values when C5 is used to scale the coefficient matrix;

Fig. 6 shows the actual values when C7 is used to scale the coefficient matrix;

Fig. 7 shows the number of binary adds to generate the unscaled cosine terms within one percent error;

Fig. 8 illustrates a flow chart of the method to design the fast DCT using scaled terms according to the present invention;

Fig. 9 illustrates a printer according to the present invention; 

Fig. 10 illustrates a data analyzing system according to the present invention; and

Fig. 11 illustrates another data analyzing system according to the present invention.
DETAILED DESCRIPTION OF THE INVENTION



In the following description of the exemplary embodiment, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration the specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized, as structural changes may be made without departing from the scope of the present invention.

The present invention provides a faster discrete cosine transform that uses scaled terms. The present invention decreases the number of multiplications by using scaled terms for the transform constants. Further, after scaling, the ratios of the constants are replaced by a series of linear shifts and additions, since the constants may be approximated by sums of powers-of-2 after only a few summations.

Fig. 1 illustrates a block diagram of a JPEG encoder 100. In Fig. 1, digital image data 110 is divided up into 8 by 8 pixel blocks 112. Then, the discrete cosine transform (DCT) of each block is calculated 120. The DCT helps separate the data into parts (or spectral sub-bands) of differing importance (e.g., with respect to an image's visual quality). The DCT is similar to the discrete Fourier transform: it transforms a signal from the spatial domain to the frequency domain.

A quantizer 130 rounds off the DCT coefficients according to the quantization matrix. This step produces the "lossy" nature of JPEG, but allows for large compression ratios. There is a tradeoff between image quality and degree of quantization. A large quantization step size can produce unacceptably large image
distortion. This effect is similar to quantizing Fourier series coefficients too coarsely; large distortion would result. Unfortunately, finer quantization leads to lower compression ratios. The question is how to quantize the DCT coefficients most efficiently. Because of human eyesight's natural high-frequency roll-off, high frequencies play a less important role than low frequencies. This lets JPEG use a much higher step size for the high-frequency coefficients, with little noticeable image deterioration. The quantizer coefficient output may then be encoded by an optional entropy encoder 140 to produce a compressed data stream 150 to an output file. Note that for JPEG the entropy encoder is not optional, but other similar data compression systems could be designed without the CPU cycles required by the entropy encoder.

As described below, there is a need to provide a method and apparatus for performing discrete cosine transforms with fewer multiplication steps to increase throughput of the encoder 100. As will be described, the method according to the present invention saves multiplications in the brute force equations by scaling the coefficient matrix. Each separable collection is scaled independently of the other collections. Within each collection, the remaining multiplications are replaced by simple shifts and adds. The scaling terms are chosen according to various cost functions. The preferred embodiment uses cost functions that minimize the number of adds and that minimize the worst-case number of adds. However, those skilled in the art will recognize that alternate cost functions could choose how much error is allowed per coefficient. Moreover, those skilled in the art will recognize that the inverse DCT (IDCT) can be implemented using the same method so that the same number of operations is used.

Let Cn = cos(nπ/16). Let f(x) for x = 0 to 7 be the input samples. Let F(u) for u = 0 to 7 be proportional to the 1-D transform coefficients. Let S(u) for u = 0 to 7 be equal to the 1-D transform coefficients. Let sxy be f(x) + f(y) and dxy be f(x) - f(y) for x, y = 0 to 7. Note that the calculations for F(u) below ignore some constants that end up in the quantization terms. The first step in the 1-D DCT is to split the coefficients into the even and odd terms by taking sums and differences of the eight input samples.



s07 = f(0) + f(7)
s16 = f(1) + f(6)
s25 = f(2) + f(5)
s34 = f(3) + f(4)

d07 = f(0) - f(7)
d16 = f(1) - f(6)
d25 = f(2) - f(5)
d34 = f(3) - f(4)

Then, the sums and differences of the sums are calculated:

s0734 = s07 + s34
s1625 = s16 + s25
d0734 = s07 - s34
d1625 = s16 - s25

Finally, we take sums and differences of the sums again to calculate F(0) and F(4):

F(0) = s0734 + s1625
F(4) = s0734 - s1625
2 * S(0) = C4 * (s0734 + s1625)
2 * S(4) = C4 * (s0734 - s1625)
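The butterfly stages above can be checked against a direct 1-D DCT. This Python sketch assumes S(u) is the orthonormal 8-point DCT, S(u) = 1/2 · C(u) · Σ f(x) · cos((2x+1)uπ/16) with C(0) = 1/√2 and C(u) = 1 otherwise, which is consistent with the relations 2·S(0) = C4·F(0) and 2·S(u) = F(u) for odd u used in this section. The helper names are hypothetical.

```python
import math

def C(n):
    # Cn = cos(n * pi / 16), as defined in the text
    return math.cos(n * math.pi / 16)

def reference_S(f):
    # Assumed orthonormal DCT: S(u) = 1/2 * c(u) * sum f(x) cos((2x+1)u*pi/16)
    return [0.5 * (1 / math.sqrt(2) if u == 0 else 1.0)
            * sum(f[x] * C((2 * x + 1) * u) for x in range(8))
            for u in range(8)]

def even_terms(f):
    # Stage 1: sums of mirrored samples (the differences feed the odd part)
    s07, s16, s25, s34 = f[0] + f[7], f[1] + f[6], f[2] + f[5], f[3] + f[4]
    # Stage 2: sums and differences of the sums
    s0734, s1625 = s07 + s34, s16 + s25
    d0734, d1625 = s07 - s34, s16 - s25
    # Stage 3: F(0) and F(4), proportional to S(0) and S(4)
    return {"F0": s0734 + s1625, "F4": s0734 - s1625,
            "d0734": d0734, "d1625": d1625}

f = [12, 40, 7, -3, 25, 18, 0, 9]
S, e = reference_S(f), even_terms(f)
print(2 * S[0], C(4) * e["F0"])                         # agree up to rounding
print(2 * S[2], C(2) * e["d0734"] + C(6) * e["d1625"])  # agree up to rounding
```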
Those skilled in the art will recognize that the equations above have been derived similarly to the Arai, Agui, and Nakajima algorithm. F(2) and F(6) can be calculated from these terms as:



F(2) = C2 * d0734 + C6 * d1625
F(6) = C6 * d0734 - C2 * d1625
2 * S(2) = C2 * d0734 + C6 * d1625
2 * S(6) = C6 * d0734 - C2 * d1625

Fig. 2 illustrates a table 200 that shows the C2 and C6 terms divided by C2 or C6: C2 = 0.923880, C2/C2 = 1.000000, C2/C6 = 1 + √2, C6 = 0.382683, C6/C2 = 1/(1 + √2), and C6/C6 = 1.000000.

Next, the utility of scaling the constants in the matrix by C6 or C2 is investigated. Fig. 3 is a table 300 that shows the actual values 310 if either C2 or C6 is used to scale the constants in the matrix and the estimates 312 are a ratio of integers. Fig. 3 shows the error 314 due to each of the approximations. C2 was chosen because the greatest error occurs in the harmonic with the least power (or the smallest contribution to the overall output).

Since F(2) is much more likely to occur, it is now exactly calculated. So the equations become:

F(2) = 12 * d0734 + 5 * d1625 = d1625 + ((d0734 + d1625 + (d0734 << 1)) << 2)
F(6) = 5 * d0734 - 12 * d1625 = d0734 + ((d0734 - d1625 - (d1625 << 1)) << 2)
(24/C2) * S(2) ≈ d1625 + ((d0734 + d1625 + (d0734 << 1)) << 2)
(24/C2) * S(6) ≈ d0734 + ((d0734 - d1625 - (d1625 << 1)) << 2)
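The shift-and-add forms of F(2) and F(6) can be verified with integer arithmetic. In this Python sketch (the function name is hypothetical), the explicit parentheses matter: in Python, as in C, `+` binds more tightly than `<<`, so writing `d1625 + (...) << 2` without the outer parentheses would shift the whole sum.

```python
import math

# The exact multiplier that 5 stands in for: 12 * C6 / C2 = 12 * tan(pi/8) ~ 4.9706
EXACT_RATIO = 12 * math.cos(6 * math.pi / 16) / math.cos(2 * math.pi / 16)

def scaled_even_f2_f6(d0734, d1625):
    """Return (F(2), F(6)) = (12*d0734 + 5*d1625, 5*d0734 - 12*d1625)."""
    # 12*d0734 + 5*d1625 = d1625 + 4*(3*d0734 + d1625): 3 adds, 2 shifts
    f2 = d1625 + ((d0734 + d1625 + (d0734 << 1)) << 2)
    # 5*d0734 - 12*d1625 = d0734 + 4*(d0734 - 3*d1625): 3 adds, 2 shifts
    f6 = d0734 + ((d0734 - d1625 - (d1625 << 1)) << 2)
    return f2, f6

print(scaled_even_f2_f6(7, -3))  # (69, 71)
print(f"5 approximates {EXACT_RATIO:.4f}")
```

With d0734 = 7 and d1625 = -3, the call returns (69, 71), matching 12·7 + 5·(-3) and 5·7 - 12·(-3). A C port should take care with negative operands: left-shifting a negative signed integer is undefined behavior in C, so such code would typically multiply by 4 (which compilers usually turn into a shift) or operate on unsigned values.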



Equation 1 shows the remaining equations for the odd constants expressed in matrix form.

In Equation 1, F(1), F(3), F(5) and F(7) form a vector related to the unquantized 1-D DCT transform coefficients. Note: for u = 1, 3, 5, 7, 2 * S(u) = F(u). If the solutions for these four coefficients are found using a "brute force" approach, then 12 further additions and 16 multiplications are necessary. However, if the matrix of coefficients is divided by one of the constants in the matrix, then the number of multiplications necessary is reduced by four. This linear system is shown in Equation 2 below.
Equation 2
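For reference, the odd-part system can be written out from the definitions Cn = cos(nπ/16) and dxy = f(x) - f(y). The following is a reconstruction under those definitions, not a copy of the original Equations 1 and 2, and the row ordering is an assumption. The first line is the unscaled system; the second divides every constant by C5, which turns four entries into ±1 and leaves 12 true multiplications.

$$
\begin{bmatrix} F(1) \\ F(3) \\ F(5) \\ F(7) \end{bmatrix}
=
\begin{bmatrix}
C_1 & C_3 & C_5 & C_7 \\
C_3 & -C_7 & -C_1 & -C_5 \\
C_5 & -C_1 & C_7 & C_3 \\
C_7 & -C_5 & C_3 & -C_1
\end{bmatrix}
\begin{bmatrix} d_{07} \\ d_{16} \\ d_{25} \\ d_{34} \end{bmatrix}
$$

$$
\frac{1}{C_5}
\begin{bmatrix} F(1) \\ F(3) \\ F(5) \\ F(7) \end{bmatrix}
=
\begin{bmatrix}
C_1/C_5 & C_3/C_5 & 1 & C_7/C_5 \\
C_3/C_5 & -C_7/C_5 & -C_1/C_5 & -1 \\
1 & -C_1/C_5 & C_7/C_5 & C_3/C_5 \\
C_7/C_5 & -1 & C_3/C_5 & -C_1/C_5
\end{bmatrix}
\begin{bmatrix} d_{07} \\ d_{16} \\ d_{25} \\ d_{34} \end{bmatrix}
$$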



Equation 2 shows that the linear set now involves only 12 additions, 12 multiplications and one division. However, the constant C5 in the division can be included in the quantization numbers once per image. Meanwhile, the benefit of only 12 multiplications instead of 16 is reaped in every 8 by 8 block of samples. Furthermore, when the constants of the matrix were scaled by one of the terms, all of the terms became approximations of sums of powers-of-2 within a few successive approximations. This is very beneficial because a multiplication, which normally is considered to necessitate more than 5 CPU cycles, now becomes a shift and add with a maximum of 3 shifts and 2 additions. Fig. 4 shows all possible combinations of dividing the constants in the matrix by C1, C3, C5, and C7.

Below, the utility of scaling the constants in the matrix by C5 or by C7 is investigated. Fig. 5 is a table 500 showing the actual values 510 if C5 is used to scale the constants of the matrix. The estimates 512 are sums of powers-of-2, and the error 514 due to the approximation is also stated. C5 was chosen because of the continuity in the terms. It was also chosen because the greatest error occurs in the harmonic with the least power (or the smallest contribution to the overall output). The constant C7 is also a good candidate because the calculations fit inside 16 bits. This optimization is shown in table 600 of Fig. 6.
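The ratios obtained by scaling with C5 (or C7) can be approximated by sums of signed powers of two with a simple greedy search. The Python sketch below is hypothetical: the greedy rule, the 1% tolerance, and the helper names are assumptions, and its approximations need not match the estimates tabulated in Figs. 5 and 6. Each additional term costs one add or subtract, and each nonzero exponent one shift.

```python
import math

def C(n):
    # Cn = cos(n * pi / 16)
    return math.cos(n * math.pi / 16)

def signed_pow2_terms(x, rel_tol=0.01, min_exp=-8, max_terms=6):
    """Greedily approximate x by a sum of signed powers of two.

    At each step the power of two nearest (in log scale) to the residual is
    added with the residual's sign, until the relative error is <= rel_tol.
    Returns ([(sign, exponent), ...], approximation).
    """
    terms, approx = [], 0.0
    while abs(x - approx) > rel_tol * abs(x) and len(terms) < max_terms:
        r = x - approx
        e = max(min_exp, round(math.log2(abs(r))))
        sign = 1 if r > 0 else -1
        terms.append((sign, e))
        approx += sign * 2.0 ** e
    return terms, approx

for scale in (5, 7):
    for n in (1, 3, 5, 7):
        if n == scale:
            continue  # this entry becomes exactly 1: no multiply at all
        ratio = C(n) / C(scale)
        terms, approx = signed_pow2_terms(ratio)
        print(f"C{n}/C{scale} = {ratio:.6f} ~ {approx:.6f} "
              f"({len(terms) - 1} adds, error {abs(approx - ratio) / ratio:.2%})")
```

Greedy selection is not guaranteed to use the fewest terms; finding the cheapest expansion for a given tolerance would require searching over all short signed-digit expansions.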

Those skilled in the art will recognize that the above scaling is not meant to be an optimal solution for minimizing error. However, an exhaustive search may be performed to find the optimal number by which to scale the matrix. Also, the scaling factor need not be a function of cos(nπ/16). As shown for the C2 and C6 cases, the best answer was a ratio of integers. For comparison purposes, Fig. 7 is a table 700 showing the number of binary adds used to calculate the unscaled cosine terms within one percent error.
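For the C2/C6 pair, the search for an integer ratio can be made exhaustive over small denominators. This hypothetical Python sketch looks for integers p and q with q/p close to C6/C2 = tan(π/8), and scores each candidate with a simple cost function: the adds needed to multiply by p and by q with plain binary shift-and-add, plus one add to combine the two products. The cost model and the search limits are assumptions.

```python
import math

TARGET = math.tan(math.pi / 8)  # C6 / C2 = tan(pi/8) ~ 0.414214

def adds_for_constant(k):
    # Adds needed to multiply by positive integer k with binary shift-and-add:
    # one add per set bit beyond the first (signed digits are not used here)
    return bin(k).count("1") - 1

def search_integer_ratios(max_den=16, max_err=0.01):
    """Return sorted (cost, relative error, p, q) tuples with q/p ~ C6/C2."""
    results = []
    for p in range(1, max_den + 1):
        q = round(p * TARGET)
        if q == 0:
            continue
        err = abs(q / p - TARGET) / TARGET
        if err <= max_err:
            # Cost: adds inside p*a and q*b, plus one add to combine them
            cost = adds_for_constant(p) + adds_for_constant(q) + 1
            results.append((cost, err, p, q))
    return sorted(results)

print(search_integer_ratios())
```

With denominators up to 16 and a 1% tolerance, the only candidate it finds is p = 12, q = 5, at a cost of three adds, which agrees with the 12 and 5 used for F(2) and F(6) above.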

In summary, the scaled terms approach fundamentally decreases the number of multiplications. First, the scaling of the quantization values is done once per image, while the benefits are reaped in every 8x8 block, or more generally, every N1×N2 block. Second, after scaling, the ratios of coefficients are very nearly sums of powers-of-2 after only a few summations. Thus, multiplications can be replaced by a series of linear shifts and additions. In some cases, these terms may be further converted into integer terms. This enables superior hardware and very fast software on machines where multiplications are costly (in terms of CPU cycles).
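The point that the scaling is paid once per image can be illustrated by folding per-coefficient scale factors into the quantization table. In this Python sketch, the quantization steps, scale factors and coefficients are made-up numbers and the helper names are hypothetical; the only claim is that dividing a scaled coefficient by a pre-scaled step gives the same quantized value as dividing the unscaled coefficient by the original step (up to floating-point rounding at exact ties).

```python
def fold_scales(qtable, scales):
    # Done once per image: absorb each coefficient's scale factor into its step
    return [q * k for q, k in zip(qtable, scales)]

def quantize(coeffs, steps):
    # Done for every block: one division (or reciprocal multiply) per coefficient
    return [round(c / s) for c, s in zip(coeffs, steps)]

Q = [16, 11, 10, 16]           # hypothetical quantization steps
k = [2.0, 0.5, 3.0, 1.25]      # hypothetical scale factors left by a scaled DCT
y = [100.0, -37.0, 58.0, 7.0]  # hypothetical unscaled coefficients
scaled = [a * b for a, b in zip(y, k)]  # what the scaled transform produces

folded = fold_scales(Q, k)
print(quantize(y, Q), quantize(scaled, folded))  # [6, -3, 6, 0] [6, -3, 6, 0]
```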
Fig. 8 illustrates a flow chart 800 of the method used to design the fast DCT using scaled terms according to the present invention. In Fig. 8, the one-dimensional transform coefficient equations for the DCTs are obtained 810. To obtain the coefficients, the coefficients are split into even and odd terms by obtaining sums and differences of the eight input samples, which are then used to calculate the one-dimensional transform coefficients.

After the one-dimensional coefficients are obtained, the constants in the 1-D transform matrix are scaled by dividing by one of the constants 820. For the odd part of each 8-point 1-D transform, this reduces the number of multiplications by four. After scaling, the ratios of constants are very nearly sums of powers-of-2 after only a few summations. Thus, multiplications can be replaced by a series of linear shifts and additions 830.
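Steps 820 and 830 can be illustrated end to end on the odd part of the 1-D transform. The Python sketch below scales the odd-part constants by 1/C5, then multiplies by 32 so that every ratio can be rounded to an integer built from shifts and adds. The factor of 32 and the integer approximations 56, 48, 32 and 11 are hypothetical choices for illustration, not the estimates of Figs. 5 or 6; the overall factor 32/C5 would be folded into the quantization values.

```python
import math

def C(n):
    # Cn = cos(n * pi / 16)
    return math.cos(n * math.pi / 16)

# Hypothetical integer approximations of 32 * Cn / C5 (not taken from Fig. 5):
#   32*C1/C5 = 56.49 -> 56 = 64 - 8
#   32*C3/C5 = 47.89 -> 48 = 32 + 16
#   32*C5/C5 = 32    -> 32 (a pure shift)
#   32*C7/C5 = 11.24 -> 11 = 8 + 2 + 1
def m56(x):
    return (x << 6) - (x << 3)

def m48(x):
    return (x << 5) + (x << 4)

def m32(x):
    return x << 5

def m11(x):
    return (x << 3) + (x << 1) + x

def odd_part_scaled(f):
    """Odd outputs F(1), F(3), F(5), F(7) times 32/C5, using shifts and adds only."""
    d07, d16, d25, d34 = f[0] - f[7], f[1] - f[6], f[2] - f[5], f[3] - f[4]
    F1 = m56(d07) + m48(d16) + m32(d25) + m11(d34)
    F3 = m48(d07) - m11(d16) - m56(d25) - m32(d34)
    F5 = m32(d07) - m56(d16) + m11(d25) + m48(d34)
    F7 = m11(d07) - m32(d16) + m48(d25) - m56(d34)
    return [F1, F3, F5, F7]

def odd_part_exact(f):
    # Floating-point reference: sum f(x) cos((2x+1)u*pi/16) for odd u, times 32/C5
    return [32 / C(5) * sum(f[x] * C((2 * x + 1) * u) for x in range(8))
            for u in (1, 3, 5, 7)]

f = [12, 40, 7, -3, 25, 18, 0, 9]
print(odd_part_scaled(f))
print([round(v, 2) for v in odd_part_exact(f)])
```

Because each integer differs from the exact 32·Cn/C5 by less than 0.5, the error in any output is bounded by about 0.84 times the largest input difference |dxy|; finer approximations trade more adds for less error, which is the trade-off the cost functions above are meant to manage.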

Fig. 9 illustrates a block diagram 900 of a printer 920 according to the present invention. In Fig. 9, the printer 920 receives data 912 from a host processor 910. The data 912 is provided into memory 930, where the data 912 may be arranged into blocks. The blocks are then processed by a processor 940, such as a raster image processor. The processor 940 provides a compressed print stream representing the data to a printhead driving circuit 950. The printhead driving circuit 950 then controls the printhead 960 to generate a printout 970 of the data.

The algorithm designed with reference to Fig. 8 may be tangibly embodied in a computer-readable medium or carrier 990, e.g., one or more of the fixed and/or removable data storage devices illustrated in Fig. 9, or other data storage or data communications devices. The computer program may be loaded into the memory 942 to configure the processor 940 of Fig. 9 for execution. The computer program comprises instructions which, when read and executed by the processor 940 of Fig. 9, cause the processor 940 to perform the steps necessary to execute the steps or elements of the present invention.

Fig. 10 illustrates a data analyzing system 1000 according to the present invention. In Fig. 10, a discrete cosine transform 1010 receives a block of data 1012 to be analyzed. The discrete cosine transform 1010 uses discrete cosine transform equations 1020 to generate transformed data 1024. Prior to execution, the discrete cosine transform equations 1020 are split into at least one sub-transform having at least two transform constants. The at least two transform constants for each collection are independently scaled with a scaling term to maintain a substantially uniform ratio between the at least two transform constants within the at least one collection, wherein the scaling term is chosen according to a predetermined cost function. The discrete cosine transformed data 1024 may then be quantized by an optional quantizer 1030.

Fig. 11 illustrates another data analyzing system 1100 according to the present invention. In Fig. 11, a discrete cosine transform 1110 receives a block of data 1112 to be analyzed. The discrete cosine transform 1110 uses discrete cosine transform equations 1120 to generate transformed data 1124. Prior to execution, the discrete cosine transform equations 1120 are split into at least one sub-transform having at least two transform constants. The at least two transform constants for each collection are independently scaled with a scaling term to maintain a substantially uniform ratio between the at least two transform constants within the at least one collection, wherein the scaling term may be chosen according to a predetermined cost function. The discrete cosine transformed data 1124 may then be compared to scaled comparison values in an optional comparator 1130.

The foregoing description of the exemplary embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.